Handwritten Urdu Character Recognition using 1-Dimensional BLSTM Classifier
Abstract
Recognizing cursive script is regarded as a challenging task in optical character recognition because of its varied representation. Every cursive script has its own nature and associated challenges. Urdu is a cursive language whose script is derived from Arabic, so it shares nearly the same challenges and difficulties, often in a harder form. Urdu and Arabic can be distinguished by the writing style they use: Urdu is mostly written in the Nasta’liq style, whereas Arabic follows the Naskh style. This paper presents a new and comprehensive offline Urdu handwritten database named the Urdu-Nastaliq Handwritten Dataset (UNHD). Currently, no standard and comprehensive Urdu handwritten dataset is publicly available to researchers. The acquired dataset covers commonly used ligatures, written by 500 writers in their natural handwriting on A4-size paper. We performed experiments using recurrent neural networks and report significant accuracy for handwritten Urdu character recognition.
Similar resources
Holistic Approach for Urdu Character Recognition Using Modified HMM
Automatic recognition of cursive handwritten script remains a challenging problem even with promising improvements in classifiers and computational power. Segmentation-based approaches to recognizing handwritten Urdu script have considerable computational overhead and lower accuracy than for Roman and Chinese scripts because of additional segmentation error. Presence of complimentary ch...
Handwritten Nastaleeq Script Recognition with BLSTM-CTC and ANFIS method
A recurrent neural network (RNN) has been successfully applied for recognition of cursive handwritten documents, both in English and Arabic scripts. Ability of RNNs to model context in sequence data like speech and text makes them a suitable candidate to develop OCR systems for printed Nastaleeq scripts (including Nastaleeq for which no OCR system is available to date). In this work, we have pr...
UCOM offline dataset-an urdu handwritten dataset generation
A benchmark database for character recognition is essential for efficient and robust development. Unfortunately, there is no comprehensive handwritten dataset for the Urdu language that could be used to compare state-of-the-art techniques in optical character recognition. In this paper, we present a new and publicly available dataset comprising 600 pages of handwritten Ur...
Automatic Recognition of Offline Handwritten Urdu Digits In Unconstrained Environment Using Daubechies Wavelet Transforms
This paper presents an optical character recognition system for handwritten Urdu digits. Much work has been done on recognizing characters and numerals of various languages such as Devanagari, English, Chinese, and Arabic, but very little has been reported for handwritten Urdu digits. Different Daubechies wavelet transforms are used in this work for feature extraction. Als...
Multi-font Numerals Recognition for Urdu Script based Languages
Handwritten character recognition for Urdu-script-based languages is one of the most difficult tasks because of the complexities of the script. Urdu-script-based languages have not received much attention even though the script is used by more than 1/6th of the population. The complexities of the script make the recognition process more complicated. The problem in handwritten numeral recognition is the shape si...
Journal title:
- CoRR
Volume abs/1705.05455
Pages -
Publication date 2017